This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks, including question answering, natural language inference, word sense disambiguation, coreference resolution, and reasoning. [Method] Instead of arbitrarily increasing the size of a pretrained language model (PLM), our aim is to 1) fully extract knowledge from the input pretraining data given a certain parameter budget, e.g., 6B, and 2) effectively transfer this knowledge to downstream tasks. To achieve goal 1), we propose self-evolution learning for PLMs to wisely predict the informative tokens that should be masked, and supervise the masked language modeling (MLM) process with rectified smooth labels. For goal 2), we leverage the prompt transfer technique to improve the low-resource tasks by transferring the knowledge from the foundation model and related downstream tasks to the target task. [Results] According to our submission record (Oct. 2022), with our optimized pretraining and fine-tuning strategies, our 6B Vega method achieved new state-of-the-art performance on 4/8 tasks, sitting atop the SuperGLUE leaderboard on Oct. 8, 2022, with an average score of 91.3.
translated by 谷歌翻译
很少有视觉识别是指从一些标记实例中识别新颖的视觉概念。通过将查询表示形式与类表征进行比较以预测查询实例的类别,许多少数射击的视觉识别方法采用了基于公制的元学习范式。但是,当前基于度量的方法通常平等地对待所有实例,因此通常会获得有偏见的类表示,考虑到并非所有实例在总结了类级表示的实例级表示时都同样重要。例如,某些实例可能包含无代表性的信息,例如过多的背景和无关概念的信息,这使结果偏差。为了解决上述问题,我们提出了一个新型的基于公制的元学习框架,称为实例自适应类别表示网络(ICRL-net),以进行几次视觉识别。具体而言,我们开发了一个自适应实例重新平衡网络,具有在生成班级表示,通过学习和分配自适应权重的不同实例中的自适应权重时,根据其在相应类的支持集中的相对意义来解决偏见的表示问题。此外,我们设计了改进的双线性实例表示,并结合了两个新型的结构损失,即,阶层内实例聚类损失和阶层间表示区分损失,以进一步调节实例重估过程并完善类表示。我们对四个通常采用的几个基准测试:Miniimagenet,Tieredimagenet,Cifar-FS和FC100数据集进行了广泛的实验。与最先进的方法相比,实验结果证明了我们的ICRL-NET的优势。
translated by 谷歌翻译
联合学习(FL)是一种新兴的分布式机器学习方法,可在分散的边缘设备上赋予原位模型培训。但是,多个同时进行的培训活动可能会超载资源受限的设备。在这项工作中,我们建议使用MUFL的智能多租户系统,以有效地协调和执行同时进行培训活动。我们首先将多租户FL的问题形式化,定义多租户FL场景,并引入一种香草多租户FL系统,该系统依次训练活动以形成基础线。然后,我们提出了两种优化多租户FL的方法:1)活动合并将培训活动合并为一个通过多任务架构的活动; 2)在训练巡回赛之后,将活动分开的活动将其分为小组,通过在活动中的活动之间采用亲和力,使团体内部的活动具有更好的协同作用。广泛的实验表明,MUFL的表现优于其他方法,而消耗的能量减少了40%。我们希望这项工作将激发社区进一步学习和优化多租户FL。
translated by 谷歌翻译
检测变压器最近显示出有希望的对象检测结果,并引起了越来越多的注意力。但是,如何开发有效的域适应技术来改善其跨域性能,尚不清楚和不清楚。在本文中,我们深入研究了这个主题,并从经验上发现,CNN骨架上的直接特征分布对齐仅带来有限的改进,因为它不能保证变压器中的域不变序列特征进行预测。为了解决这个问题,我们提出了一种新型的序列特征比对(SFA)方法,该方法是专门设计用于适应检测变压器的。从技术上讲,SFA由基于域查询的特征对齐(DQFA)模块和令牌特征对齐(TDA)模块组成。在DQFA中,一个新的域查询用于从两个域的令牌序列中汇总和对齐全局上下文。 DQFA分别在变压器编码器和解码器中部署时,降低了全局特征表示和对象关系中的域差异。同时,TDA在两个域中的序列中都对准令牌特征,从而分别降低了变压器编码器和解码器中局部和实例级特征表示中的域间隙。此外,提出了一种新型的两分匹配损失,以增强可鲁棒对象检测的特征可区分性。在三个具有挑战性的基准上进行的实验表明,SFA优于最先进的域自适应对象检测方法。代码已在以下网址提供:https://github.com/encounter1997/sfa。
translated by 谷歌翻译
Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult to inductively infer by KGEs. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework AnKGE to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects from entity-level, relation-level, and triple-level. And in AnKGE, we train an analogy function for each level of analogical inference with the original element embedding from a well-trained KGE model as input, which outputs the analogical object embedding. In order to combine inductive inference capability from the original KGE model and analogical inference capability enhanced by AnKGE, we interpolate the analogy score with the base model score and introduce the adaptive weights in the score function for prediction. Through extensive experiments on FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on link prediction task and well performs analogical inference.
translated by 谷歌翻译
For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. The existing empirical or physical models can reveal important information regarding the degradation dynamics. However, there is no general and flexible methods to fuse the information represented by those models. Physics-Informed Neural Network (PINN) is an efficient tool to fuse empirical or physical dynamic models with data-driven models. To take full advantage of various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion-batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of Li-ion Phosphate (LFP)/graphite batteries.
translated by 谷歌翻译
In this tutorial paper, we look into the evolution and prospect of network architecture and propose a novel conceptual architecture for the 6th generation (6G) networks. The proposed architecture has two key elements, i.e., holistic network virtualization and pervasive artificial intelligence (AI). The holistic network virtualization consists of network slicing and digital twin, from the aspects of service provision and service demand, respectively, to incorporate service-centric and user-centric networking. The pervasive network intelligence integrates AI into future networks from the perspectives of networking for AI and AI for networking, respectively. Building on holistic network virtualization and pervasive network intelligence, the proposed architecture can facilitate three types of interplay, i.e., the interplay between digital twin and network slicing paradigms, between model-driven and data-driven methods for network management, and between virtualization and AI, to maximize the flexibility, scalability, adaptivity, and intelligence for 6G networks. We also identify challenges and open issues related to the proposed architecture. By providing our vision, we aim to inspire further discussions and developments on the potential architecture of 6G.
translated by 谷歌翻译
In this paper, we investigate the joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, where data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Due to the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. Therefore, we propose a deep learning (DL) modified AMP network (DL-mAMPnet) that enhances the detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability, thereby benefiting from the advantages of both techniques. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further advance the performance by utilizing the spatial feature caused by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet can significantly outperform traditional algorithms in terms of the symbol error rate performance.
translated by 谷歌翻译
Domain adaptation methods reduce domain shift typically by learning domain-invariant features. Most existing methods are built on distribution matching, e.g., adversarial domain adaptation, which tends to corrupt feature discriminability. In this paper, we propose Discriminative Radial Domain Adaptation (DRDR) which bridges source and target domains via a shared radial structure. It's motivated by the observation that as the model is trained to be progressively discriminative, features of different categories expand outwards in different directions, forming a radial structure. We show that transferring such an inherently discriminative structure would enable to enhance feature transferability and discriminability simultaneously. Specifically, we represent each domain with a global anchor and each category a local anchor to form a radial structure and reduce domain shift via structure matching. It consists of two parts, namely isometric transformation to align the structure globally and local refinement to match each category. To enhance the discriminability of the structure, we further encourage samples to cluster close to the corresponding local anchors based on optimal-transport assignment. Extensively experimenting on multiple benchmarks, our method is shown to consistently outperforms state-of-the-art approaches on varied tasks, including the typical unsupervised domain adaptation, multi-source domain adaptation, domain-agnostic learning, and domain generalization.
translated by 谷歌翻译
Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services which require low delay and high accuracy. Sampling rate adaption which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is the key in minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaption, inference task offloading and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results are provided to demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with a high probability.
translated by 谷歌翻译